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In a previous paper a method was developed to subtract the interactions due to periodically 
replicated charges (or other long-range entities) in one spatial dimension. The method constitutes 
a generalized "electrostatic layer correction" (ELC) which adapts any standard 3D summation 
method to slab-like conditions. Here the implementation of the layer correction is considered in 
detail for the standard Ewald (EW3DLC) and the P3M mesh Ewald (P3MLC) methods. In 
„Cj ■ particular this method offers a strong control on the accuracy and an improved computational 

D ' complexity of 0(N log N) for mesh-based implementations. We derive anisotropic Ewald error 

I^H \ formulas and give some fundamental guidelines for optimization. A demonstration of the accuracy, 

. error formulas and computation times for typical systems is also presented. 
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INTRODUCTION 



Long-range forces, namely gravitation and electromagnetic interactions, are difficult to treat exactly in many-particle 
systems. In condensed matter, biological and solid state physics, there is considerable interest in the simulation of 
charged and polar molecules embedded in a slab in which the particles are replicated periodically in two directions. The 
applications range widely to include soap bubbles, cell membranes and electrochemistry. Surprisingly Ewald and other 
rapidly convergent methods for computing long-range interactions in infinitely periodic systems are computationally 
more demanding when the system is periodic in only one or two of the three spatial dimensions because of the breaking 
of spatial symmetry. m 

Several "two-dimensional Ewald^.|-/EW2D) methods have already been proposedEJ. The most successful is that 
introduced by ParryoO and otherscM This method can be used to obtain accurate results but is limited to small 
systems (i.e. 10 2 to 10 3 charges). This springs from the fact that the pair separations, , are no longer separable in 
the Fourier summation because of the necessary decoupling of the aperiodic z component from the periodic x and y 
components. Therefore it appears that the complexity (scaling of time with number of charges) of this method can 
' be no better than quadratic. Also the one dimensional Ewald methodB suffers from the same problem. 

A promising alternative to Ewald summation is possible using a convergence-factor techniqueQ. The basic Coulomb 
pair interaction is multiplied by one with the factor lim^oo e~ /3rijn . It is crucial to prove that the limit can be taken, 
outside of the summation and now the modified potential is more readily susceptible to analysis. In a recent study,u 
Arnold and Holm derive a two-dimensional version of this method with complexity N 5 ^ 3 along with accurate error 
formulas. This method is particularly attractive when extremely high accuracy is desired. We will use this method, 
abbreviated as MMM2D and the EW2D algorithm mentioned above as reference methods. 

One idea, used several times to study water interfaces, introduces a spatial constraint within the simulation cell. If 
the box has dimensions L x x L y x L z , the particles may have z-coordinates on [0, h), while the space within [h, L z ) 
(the gap size) remains emptylj. The primary cell is, as usual, replicated in all three directions periodically. The 
effect of this is to create a "primary layer" and an empty layer followed by an infinite array of intercalated image 
and empty layers. Figure |l| summarizes the essential geometrical considerations in this problem. While this idea 
seems trivially to be correct, it has one flaw. The flaw can be realized by attempting to reproduce the results of an 
EW2D calculation using a three dimensional Ewald summation (EW3D) as just described. One will find that the 
results always differ by a dipole- moment-jdependent constant. This fact, although at first thought to be merely a slow 
convergence issue, was noticed by Spohrti-. TJie EW3D formula contains a shape dependent dipole-term whose origin 
was mathematically proven in the 1980'sEJEj, that reflects the naturally chosen spherical order of summation in the 
conditionally convergent Coulomb sum. For the £ase of layers, however, it is necessary to use a_ slab-wise summation 
order. This was realized by Yeh and BerkowitzE^I, who applied a theory due to E. R. Smitbo for infinite crystals 
of various shapes in order to obtain the correct dipole term for a slab-wise summation order. Another complication 
arises, if the periodic supersystem is surrounded by some medium with a different dielectric constant. This medium 
has a polarizing effect on the particles in the simulation cell no matter how large the supersystem is. For the remainder 
of this paper we consider only the simple case of zero contrast for which the dielectric constant is the same inside and 
outside the supersytem. This is often referred to as the vacuum boundary condition. 

One of the advantages of using this method from a practical point of view is that any standard Ewald program 
may be used with only minor modification. It is easy to show numerically that the use of this new dipole term 
and sufficient spacing (L z — h), yields the same result as two-dimensional "brute force", MMM2D and EW2D 
summation methods. As explained in a previous paper, referred to here as "paper I"E3, errors introduced due to the 
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FIG. 1: Schematic of periodicity. 



TABLE I: Methods for slab geometry. 



method 


authors nnn 


complexity 


EW2D 


Parry; HBC;J3PtH 


N 2 


Lekner 


LekneiO 


N 2 


MMM2D 
EW3DC 


Arnold & HolmEL 
Yeh & Berkowitzbl 


TV 3 / 2 


EW3DLC 


EwaldH + ELC 
Hockney & EastwopdEl + ELC 


AT 3 / 2 


P3MLC 


TVlogiV 


MMMLC 


Sperb & StrebelD ±ELC 
Greengard & Rhokliifl + ELC 


TVlogiV 


FMMLC 


N 



image layer effects decay exponentially with, the size of the gap. This method is sometimes referred to as a corrected 
three-dimensional Ewald sum (EW3DC)EJ. Like the original Ewald method it has a complexity of TV 3 / 2 and the 
equations can be discretized onto a mesh yielding an iV(logiV) 3 / 2 complexity. In paper ItB, we derived an analytic 
electrostatic layer correction term, called ELC-term, that subtracts the interactions of the unwanted slab replicas. 
This term can be evaluated linearly in N, and has full error control. The use of the ELC term and the change 
of summation order, shortly called the ELC method, enables us to adapt any 3D summation method, such as the 
non-Ewald convergence-factorQ and multipole expansion^ methods, to slab geometries. The latter of these, though 
in possession of a slightly better scaling, has such a considerable amount of overhead that it is useful only for much 
larger systems than normally usedli3. The abbreviations and complexities of some of the two-dimensional methods 
available at present are summarized in Table |, where the ending LC always denotes the use of the ELC method. In 
this terminology the EW3D algorithm plus the ELC method is called EW3DLC and so on. 

This paper is organized as follows. In Sec. || the equations required for implementation of EW3DLC are collected 
and discussed. In Sec. Ill the standard Ewald error formulas are extended for anisotropic systems and the error 
formula for the new layer correction term is described. The methodology is demonstrated on several test-systems in 
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Sec. IV and the efficiency compared with other 2D methods is shown in Sec. Finally the results of this paper and 
the outlook for computational studies of slab geometries are summarized in Sec. [Vl]. In the Appendix we give explicit 
formulas for the ELC-term contributions to the force and the diagonal part of the stress tensor. 



II. EW3DLC IN A NUTSHELL 



The electrostatic energy of one section of a periodic slab follows from the classical formula for the Coulomb pair 
potential. Omitting the prefactor (l/47reo), 
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This serves as a useful point of introduction to our notation: N particles with charges qi and positions r$, reside 
in a simulation box of edges L x x L y x L z . The image boxes are denoted using the vector ny e {1>L X , ZL y , 0) and 
Yij = Yi — Tj. The prime on the inner summation indicates the omission of the primary box, ny = (0, 0, 0), when i — j 
(the singular case). Strictly speaking the sum over ny is ill-defined without specification of the order of summation. 
This is a consequence of the fact that summations of the type |n + c\~^ are conditionally convergent for < £ < d, 
where d is the dimensional periodicity. The summation limit of relevance is an expanding cylinder or disc such that 
the summation terms appear in order of increasing |nii|. 

We convert this into a three-dimensional problem by nesting the L x x L y x h particle volume within a larger 
L x x L y x L z box, thereby creating a screen for the ghost charges (the new image charges) as described in the 
introduction. The total electrostatic energy E is, via combination of the celebrated Ewald identity and the recent 
convergence-parameter layer correction, rewritten in the computationally convenient form: 



E = E {r) + E {k) + E (s) + E (d) + E [l 



(2) 



where the individual terms are as follows, 



N I 



BW = 

E (k) = 

E^ = 

E {d) = 



jEE^- erfcHry + n|) 



i,j=l n 
1 1 

k#0 



a \ - 2 

2tt 



fc2 



r,, + n 



4n e -k 2 /4a 2 



N 
3=1 



L x L y L z 



i 

(E * Zi ) 



(3) 

(4) 
(5) 
(6) 



8tt x ^ 

An 



tny E (-d p 



X.pqr 



LxL y /c x >o 



^2 k T (e k * L * - 1) E ^ 



An 

L xLy - n 



1 2 

^ k y {e k v L * - 1) 



1 

pqO 



^ p,q=l 



V 2 

ApOq 



(7) 



where the summations run over n S (ZL x ,ZL y ,ZL z ) and k £ 2n(Z/L x , Z/L y , Z/L z ) and are order-independent. 
As usual, the symbol a denotes the Ewald parameter which: (1) determines the radius of the Gaussian charge 
distributions, (2) has units of inverse length and (3) can take any value without changing the net result for the energy, 
but has an optimum with respect to convergence of the summations. The first three terms (Eqs. |3k p|) correspond to 
the usual Ewald real (r), Fourier (k) and self (s) contributions to, the energy. The dipole term W d > arises from the 
cylindrical or slab-wise summation limit as discussed in paper lll3 and at length in the introduction. It differs from 
the spherical version by a factor of 3 and by the replacement of the total dipole moment with its z-component. Again, 
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this dipole term applies only to vacuum boundary conditions. The final term £^ c ' is the energy subtraction due to all 
image layers. This term was derived in paper I and is shown here in an expanded O(N) form. These summations run 

over the variables k x , k y € 2n , Z/L x , 27rZ/i a and the parallel combination of these is k\\ — \J k 2 + k y . The component 

abbreviations are as follows, 

N 

Xi/2,1/2,1/2 = gi sinh / cosh(fc||^)sin/ cos{k x Xi) sin / cos(k y yi) (8) 

i=l 

and a subscript of zero stipulates omission of the corresponding trigonometric function. For example, X2,o 1 = 
1% cosh(fc||Zi) sin(fc y j/i). The corresponding equations for the forces can be obtained by differentiation of Eqs. j^-^, 
i.e. f; = -ViE. 

The derivation of the layer correction energy is a crucial part of the method so we shall recapitulate it here in brief. 
The objective is to obtain a rapidly convergent linear expression for the total energy of interaction of the particles in 
the primary box with the ghost charges: 

i N 

n z ^0 ri|| i,j=l 1 lJ IH 

It is quite similar in appearance to the original formula (Eq. [I]), and can be treated analogously by convergence factors 
as in the study by Arnold and HolmH. ■— ■ 
We have not here written explicitly the summation order but nevertheless this is an important consideration.Eil 
With the Poisson summation formula applied along each periodic direction a transformation of the in-plane variables, 
n x and n y , into Fourier conjugates k x and k y is achieved. The derivation depends heavily on the requirement that 
the system is overall neutral and results finally in 
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It is fortunate (in contrast to EW2D) that the coordinates appear as they do because after decomposition of the cos 
and cosh terms, the N 2 summation collapses, yielding the order N Eq. [?]. For reference purposes we have assembled 
in the appendix the full expressions for the force and the diagonal parts of the stress tensor due to the ELC-term. 

III. ERROR ESTIMATES 

The most important measures of error in an electrostatic calculation are the root mean squared error in the particle 
forces, 
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where Afi is the difference between the computed and exact force on particle i, and the absolute value of the error 
in the total electrostatic energy, AE. We will discuss mainly the former, which is relevant to molecular dynamics 
simulations. 

The total error can be separated into terms originating from the cutoffs used for the Ewald and layer correction, 



A/ = x /A/2 + A^ + A#, (12) 
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FIG. 2: Error distributions for EW3DLC, p(Af), and layer correction , p(A/; c ), for a 100 particle system of physical 
dimensions 1.0 x 1.0 x 0.9. Convergence parameters for EW3DLC: a = 9.3, r c = 0.36, k c = 10, L z = 2.0, A/ ic < 10~ 3 ; for 
LC: L z = 1.0, Af lc < 10" 2 . 



where the individual components are determined in the same way as in Eq. [llj Analytical, error estimates for the 
real- and reciprocal-space terms for a cubic simulation box have been previously derived.c3'E3 Shortly we will extend 
these results to obtain the formulas for arbitrary box sizes, but we write first the result, 
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where Q 2 = JD i=1 n 9i> r <= ^ s ^ ne real-space minimum image cutoff (typically but not necessarily < | min[_L x , L y , L z ]) 
and k c is an integer defining the truncation of the reciprocal-space summation. These naturally reduce to the standard 
formulas when the box is cubic. The optimum set of convergence parameters is obtained by setting Af 2 — Af 2 . This 
reduces the number of independent variables to one yielding implicit relations a(k c ), r c (k c ). Then one can easily scan 
k c to find the optimum (i.e. fastest) k*. 

If the reciprocal space computation is instead handled by FFT methods such as P3M, inaccuracies arise due to 
the assignment and de-assignment of the charges onto a mesh. The error for P3M has been estimated (for a cubic 
system) by 



Af P3M (a,M,P) 



where M is the number of mesh points pe 
are constants given by Deserno and Holm.[ 
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.edge and P is the interpolation order. The expansion coefficients, 



A more accurate numerical error estimate is also given by these authors 



15 is useful since it captures the function form of the error. 



The Ewald terms have the advantage of well-behaved error distributions. This is not the case when turning to the 
layer correction term. Consider for example the probability distribution of Af and A/; c over the ensemble of random 
particle configurations as shown in Fig. |[ The former distribution was generated under conditions of negligible 
layer correction error while the latter distribution is for layer correction error itself under conditions where it is 
significant (i.e. small gap). The distribution of the total EW3DLC error fits very well to a Gaussian curve. The 
error distribution of the layer correction term on the other hand is skewed to the left and fits well to a stretched 
exponential p(x) oc x exp(— ax). The extended tail of this distribution means that exceptions to the average error will 
occur with non-negligible frequency. This feature makes it difficult to derive an accurate RMS error estimate for the 
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layer correction. But what is possible and more useful is the derivation of a stricter error criterion, namely an upper 
bound on the RMS error. A rigorous upper bound was derived in the first paper yielding, 

Q 2 y/3 f{2wl c + 4 , 1 \ e ™° h / L 
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where l c is the integer-valued radial cutoff for truncation of the layer correction forces. 

Before leaving this topic a reminder is due of the bias present in slab geometries. Both the Ewald and layer 
correction terms have an error bias with respect to the z-position of a particle. The error tends to be higher near 
the walls of the system as if it were "hotter" in those regions. In fact, one can imagine a simulation in which the 
overall RMS error in the forces is smaller than the RMS random force but due to this bias, the RMS force-error at the 
surfaces is larger than the thermal noise, amounting to true hot zones at the surfaces. Therefore it is recommended 
to exercise some caution in the treatment of these effects. The layer correction error in particular has a strong surface 
bias, and it is therefore quite appropriate to use an upper bound in place of an RMS error estimate. In other words, 
the layer correction should be carried out to higher accuracy than the real and reciprocal space parts. 

Anisotropic Real- Space Error 

Let us write down the real-space force first, 
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This is always summed using a minimum image spherical cutoff |r mj - n | < r c regardless of the system geometry. 
Therefore the error in truncation of Eq. ^ for the force on particle i is 
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where g is an abbreviation for the term in square brackets. We square this and obtain 
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where ri,r2 are the vectors pointing from particle i to the n th images of j and k respectively. Since g is an even 
function and q\ , qi will take negative and positive values the average of this quantity over all the i particles in the 
system contains, in the limit of large, uncorrelated N, only the diagonal terms. Thus 

N 

((5f i ) 2 > = g|^g| r ^ 2 (|n|). (20) 

3 = 1 n:|ri|>r e 

Moreover in the statistical limit the summation S n is independent of particles i and j and we have 

<(**) 2 > 1/2 = \ qi \QK /2 , (21) 

where Q 2 = We will proceed by treating the summation as a spherical integral. First we write the variable 

explicitly 

|n! = ^{x + sL x y + {y + tL y ) 2 + {z + uL z y =p (22) 
where s, t, u £ Z. In Cartesian coordinates we have the following integration 
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where we have applied s = s/L x etc. 

The final step involves the application of the asymptotic integral expansion, 
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where the validity of this relation is discussed by Kolafa & Perram. We can apply this formula twice to obtain our 
mean-squared particle error; 
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Finally we can obtain the root mean squared force error 
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Neglecting the smaller a 2 term yields Eq. hj 



Anisotropic Reciprocal-Space Error 

First we write the generalized anisotropic fc-space force on particle i, 

f (k) qi sr^ 47rk ( k2 \ ■ 
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where again k S 2Tr(7*/L x ,Z/L y ,Z/L z ). 

The summation will be approximated using a cutoff r and therefore only terms satisfying 

k 2 < r 2 (29) 

will be added. This equation defines the interior of an ellipsoid whose axes are L x /2ir, etc. 

Now we compose the cutoff error for a single pair of particles, for which it will prove useful to write using complex 
notation. Using the fact the function in front of the sin is odd and that half of the vectors k are identical but opposite 
in sign to the other half we obtain 

5fij = -tj^l z £ i? cxp (-£?) 1 exp(zk • (30) 

* |k|>r 



Taking the square of this yields 

(^) 2 = - ( yfV) 2 £ E mn lt, k2 ^ +k ^ «P(*x • '«) • r«). (31) 



L x L y L z J |ki|>r|k2|>T ^2 

At this point we remind ourselves that, in spite of how it looks, this expression is indeed real and positive- valued. We 
want to obtain the average of this function over all the particle pairs in the system. In the limit of many uncorrelated 
particles the following expression is exact, 

(exp(iki • ry) exp(ik 2 • ry))^ = 5(ki,-k 2 ). (32) 
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The details of this require the observation that < s\C2 >—< sis 2 >=< c\c 2 >= for ki 7^ k 2 , 7^ — k 2 and < s 2 >=< 
c 2 > where the abbreviations S\ = sin(ki • r^) and c\ — cos(k! • r^-) have been used. Finally we reach a greatly 
simplified version of the root-mean-squared pair-wise error: 
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Now we will be somewhat explicit in treating the summation above. We rewrite the summation in terms of its 
fundamental variables and length scales, 
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Here we are taking a — 2ir 2 /a 2 = 2tt 2 / (aL) 2 , A = L x /L z , 7 = L x /L y , p 2 = I 2 + m 2 -f 2 + n 2 X 2 , l,m,n G Z and k c is 
the integer cutoff which defines an ellipsoidal solid upon Z 3 . With this notation, all of the units of the expression are 
carried by the factor L 2 . This equation can be approximated by integration; in Cartesian coordinates 
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where the limits which we do not write here, will collapse into a much simpler form. Conversion to spherical coordinates 
requires only the substitutions n = An, fh — 7m and we have 
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To summarize the results of this section, we have found 
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a result that was obtained by imposing a single-parameter ellipsoidal cutoff which makes simple spherical integration 
possible. That is to say, if we had not restricted the degrees of freedom of the cutoff object, then we would be faced 
with the daunting task of ellipsoidal integration. Two assumptions were made, namely that the function varies slowly 
enough across the grid points so that integration is valid, and secondly that the integral theorem mentioned by Kolafa 
& Perram is accurate. 

We can write the RMS error, first we have the pair-wise error approximated as 
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IV. DEMONSTRATION OF ACCURACY 



In this section we demonstrate the credibility of the EW3DLC technique by means of several standardized test 
cases that have been used in other literature. These fall into two categories: (1) two-particle systems and (2) a single 
charge elevated above a planar array of 25 charges. These cases are known to emphasize the possible errors arising 
from the non-periodicity in the z-direction. Secondly, several tests of the error formulas are carried out to see that 
they are indeed applicable to non-cubic, inhomogeneous systems. For these tests it is sufficient to use a standard 
three-dimensional Ewald summation. 

In Fig. H the results from a two-particle system are given. In this system the first particle is fixed and the second 
particle is moving in the z direction. For the remainder of this paper we will use L x = L y = L and unless otherwise 
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FIG. 3: Energy and force in a two-particle system: n = (0,0,0), v-z = (0.1, 0.1, z), qi = —1 and qz = 1. As a reference, a 
well-converged (A/ < 10 -10 ) MMM2D calculation is used. Convergence parameters: a = 5.0, k c = 10, r c = 0.5, L z = 1.5, 
Afi c < 10 -10 . (a) Energy and forces, (b) Absolute error in energy and forces. 



noted L is set equal to unity. In principle there is no limitation on the accuracy of the EW3DLC or EW3DC 
methods. The only visible errors (Fig. |a) occur for the EW3DC method as z — > 1 since the particle begins to 
approach the nearest image layer. This error can be eliminated (at some computational cost) by increasing L z , but 
we are interested in illustrating the point. 

A richer picture is obtained by plotting the errors on a semi-log scale (Fig. ||b). The EW3DLC curves are complex 
but these quirks vanish already in 3 or 4 particle systems and are more a signature of the underlying features of the 
Ewald equations. The key to the explanation of these features is the spherical minimum image cutoff which we chose 
as r c = 0.5. When z reaches 0.48, the particles are no longer in view of each other, nor are any of the images in 
view. This leads on the right side of the plot to a significant underestimate of the real-space energy and force and 
is dominated by the exponential in Eq. || The left half of the EW3DLC curves are dominated by Fourier-space 
error which is trigonometric with a period of (k c + l)L/L z due to the complex exponential in Eq. |^. Thus the 
final conclusion is that the basic two-particle curves exhibit nothing problematic in terms of the force and energy 
signatures. 

The next test-system is the 26-particle one originally used by Widmann and Adolfcl: a square array of 25 particles 
in the z — plane with coordinates (x, y, 0) where x, y € (0.1, 0.3, 0.5, 0.7, 0.9), and the 26th particle hovers above the 
center at (0.5,0.5,0.2). The values reported previously are easily reproduced using a layer correction. For example, 
setting (a, r c , k c , L z ) = (15.0, 0.40, 15, 0.80) produces —86.5655 and —10.3642 for the total energy and the z-component 
force of the 26th particle respectively. For the layer correction the the maximal pairwise error was set at 10 -6 . The 
layer correction contribution in this case is important even though the box edge is four times the layer thickness, 
amounting to —0.178234 and 0.476241 for the force and energy respectively. 

With these basic properties of the proposed algorithm established we may begin to probe its applicability to realistic 
systems. To this end we generate a "random" system of 100 unit charges (overall electrically neutral). We have followed 
the same convention as Ref.E3 so that these systems can be reproduced by other researchers. A set of 300 random 
numbers ni, ri2, . ■ . , 71300 between and 1 and the coordinates of the charges are taken as ri = (n\L, n%L, n^L z ), 
Y2 = (114L, n^L, ngLg), ... The system was checked to make sure that there are no problematically small spacings. We 
use this system to check the applicability of the anisotropic error formulas for varying shape and inhomogeneity and 
in the next section to compare the speed of various algorithms. 

For the error formulas it is sufficient to consider the standard Ewald summation (EW3D). In Fig. [|a the results 
from a standard Ewald summation are compared to the error formulas Eqs. |l3"| - |l4| . For the reference forces we also 
used Ewald summation, convergent to 8 digit accuracy in the total energy. The error formulas are satisfactory for 
both cigar- and pancake-shaped systems (i.e. L z > 1 and L z < 1 respectively). 

The second concern is the validity of the error formulas when part of the system is empty. This is tested for 10, 50 
and 100% filled systems in Fig. [|b. There is no dependence on h in the error formulas and therefore the error curves 
should all be the same. In fact there is good agreement and the most substantial deviations occur in the unimportant 
large-a region. These deviations are evidence of only a small effect of EW3DLC-type inhomogeneity on the error 
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FIG. 4: Dependence of the error estimates on the box shape (a) and the inhomogeneity (b) for standard three-dimensional 
Ewald summation in a randomized 100 particle system, (a) The aspect ratio L z /L of the box is indicated on each curve. The 
solid lines indicate computations (k c ,r c ) = (8,0.25) and the dotted lines are error estimates from Eqns. [l^-|l^. (b) The fill 
ratio h/L z is indicated on each curve. Computations (solid lines) are for (8, 0.5) and the dotted lines show the error estimates. 



formulas. 

To summarize, we have shown that the EW3DLC method produces accurate and well controlled results and can 
be used for a broad range of shape and inhomogeneity. This is an important advantage over the EW3DC method 
which is more restrictive on the selection of L z even if the error estimate that we developed in the first paper is used 
to control the error. 



V. COMPUTING TIME 



Of course the critical consideration, once issues of accuracy are established, is the speed with which a given algorithm 
can be implemented. This makes the P3MLC the most efficient choice for large N but for smaller systems other 
algorithms such as EW3DLC or MMM2D may have superior speed due to a lower overhead. This section attempts 
to give some quantitative information on the break-even points between these various algorithms. As usual, the 
relative times vary slightly with the programming efficiency and the software and hardware of the computer. All 
times in this study were collected on a Compaq XP1000 workstation. 

To optimize the time required for such a many-parameter computation, it is necessary to understand the scaling of 
time with the variables - most importantly with the number of charges, N. Here a brief, qualitative discussion of the 
minimization is made. This discussion tends to be more transparent (with the benefit of hindsight) yet less rigorous 
than more detailed versions available elsewhere. 

The computing time of the real-space part will in general require 0.5N p^Airr^/S) operations where p = N/L 2 h 
(large h) is the density. Similarly the reciprocal-space calculation scales as NL z k^. since increasing L z requires a 
proportionate increase of the z-component reciprocal space cutoff. The layer correction scales as Nl 2 since it is a 
cylindrical summation. Thus the overall time required scales as 

t elc = ai N 2 rl + a 2 NL z kl + a 3 Nl 2 c (40) 

As is evident in plots like Fig. |I[ the sharpness of the crossover between the real- and reciprocal-space accuracy 
regimes is such that the choice of a, k c and r c will be near the optimum if Af r « A/&. The layer correction cutoff 
can be determined independently from the others and moreover is only a weak function of L z for large (L z — h). 
Eq. ^ is then interpreted as a function of two variables, for instance the reciprocal cutoff and the desired accuracy 
tele = f(k C) Af). In order to neutralize the change in accuracy in Eqs. [l3] Q the Ewald parameter should scale 
inversely with the real cutoff and linearly with the Fourier cutoff, i.e. a ~ l/r c ~ k c . Thus we can strike a compromise 



or balance between the first two terms of Eq. 40 by fixing 

^max N ^ , k max ~ N ^ (41) 



11 



10' 



10 - □ □ MMM2D 

EW3DLC 

G O P3MLC 

10° - W(log/V) 3 ' 2 
ELC 



10"' 



10" 



10 




10 100 1000 10000 

N 



FIG. 5: Optimal CPU times for P3MLC, EW3DLC and MMM2D algorithms. The dotted curve displays the asymptotic 
scaling of the P3M algorithm, and the dot-dashed curve shows the CPU time needed for computing the ELC-term. 

which reduces the overall scaling of the Ewald calculation to the well-known 0(N 3 ^ 2 ). Note that the inclusion of the 
layer correction does not influence the scaling since it will only add a term linear in N to Eq. |l)|. Furthermore, it 
becomes a smaller part of the overall calculation as N increases. 

A similar analysis applies to mesh-based algorithms, of which we will deal with the P3M method. The major 
difference is that the number of operations required to perform a FFT is independent of the number of charges and 
only depends on the number of mesh points as M log M per direction in space. The other part of the mesh routine 
is the charge assignment via polynomial interpolation. The order of interpolation, P, represents the number of grid 
points in each direction that a given charge is distributed onto. Therefore it is much like a cutoff parameter and the 
number of operations per charge scales as (P + l) 3 . The time required for a P3M computation is 

t p3m lc = a t N 2 r 3 c + ai L z (M\ogMf + a 5 N(P + l) 3 + a 3 Nl 2 c (42) 

An intuitive analysis of Eq. ^| is more difficult but analogous to the preceding reasoning (and benefits from 
foreknowledge of the answer). Again we attempt to balance the ./V-exponent in the first two terms but this time a 



substantially better result is possible due to the much smaller exponent in the second term (0). From Eq. |l5j, the 
choice of a ~ M will roughly neutralize the effect of M and thus the link between the mesh size and real cutoff is 
r c ~ 1/M. The best choice for the scaling of these terms is 

r P°g*0 1/a M ^ (43 ) 

resulting in an overall scaling for the time of 0(N (log TV) 3 / 2 ). 

The prefactors in the above equations represent a mixture of "primitive" times for various computing operations 
as well as, for brevity, some unimportant physical parameters. But in general we expect a performance crossover 
which favors the Ewald method for low ./V because of a smaller overhead and eventually at large N, the P3M method 
becomes much faster. 

By generating random systems for a broad range of N it is possible to obtain an estimate for the crossover. We 
compare what are perhaps the three best algorithms for slab-like systems in Fig. | at a fixed error, Af = 10~ 2 
e 2 /AireL 2 , including the Coulomb prefactor that has been until now omitted. This level of accuracy is as a value that 
is practical for Langevin and Brownian dynamics simulations. For instance, in a typical N = 3600, p = 4.2 • 10~ 3 
polyelectrolyte simulation, the random forces are of order 2 • 10 3 times larger than the above RMS electrostatic force 
error. 

The recently introduced non-Ewald MMM2D method we recall has a theoretical complexity 0(iV 5 / 3 ). The break- 
even point between the MMM2D and P3MLC methods at this low accuracy is approximately. M MMM2D ^ P3MLC = 
50 and this would increase monotonically with increasing accuracy. For reference the dotted line in Fig. indicates 
7V(log(iV)) 3 / 2 scaling as is expected for the P3MLC method. The EW3DLC curve plotted shows two break-even 
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TABLE II: Comparison of EW3DC and EW3DLC. 





EW3DLC 


EW3DC 


system 


(a,r c ,k c , L z ) time (s) 


(a,r c ,k c ,L z ) time (s) 


pancake 
cube 
cigar 


(10.20,0.271,9,0.75) 0.266 
(8.00,0.346,7,1.5) 0.261 
(6.80,0.410,6,2.4) 0.257 


(8.30,0.321,8,2.0) 0.342 
(7.00,0.384,7,3.0) 0.356 
(6.90,0.398,7,4.0) 0.368 



points at N MMM2D - EW3DLC = 200 and N EW3DLC - p3MLC = 17. The shape of these curves for the low N region of 
the plot is dictated by the computational overhead which can be linear or constant depending on the algorithm. It is 
necessary to make Verlet lists outside of the timing loop to avoid the quadratic determination of the pair distances in 
the real-space calculation. Alternatively, these can be subtracted at the end if one accurately measure the primitive 
time of this step, as we have done. Note also that the layer correction part of the calculation used roughly 20% of the 
computation time in the scaling regime and the upper bound on its error was fixed at 10~ 2 . 

Finally we make a comparison between the speeds of EW3DLC and EW3DC. Since these methods have the 
same scaling there is only a difference in the prefactor. But the larger gap size required in EW3DC mandates an 
increase of the reciprocal space cutoff in order to maintain the same accuracy. Thus there is a computational tradeoff 
between computing the extra layer correction or expanding the region of reciprocal space summation. We have used 
three systems with different shapes and N = 1000 for the comparison. The first system has a cubic particle volume 
sized lxlxl while the other two are pancake and cigar shaped with h = 0.5 and 2.0 respectively. The parameters 
(a, r c , k c , l c , L z ) were optimized with respect to fixed RMS errors of 3.0, 4.8 and 1.9 • 10~ 2 for the pancake, cube and 
cigar respectively. Again the ELC error was chosen conservatively by setting the upper bound to 10 -2 . These values 
were chosen in order to compensate for the different volumes of these systems so that the error at the same density 
is the same for all three shapes. 

The results of this analysis are displayed in Table |3. Under these conditions, the EW3DLC method is only slightly 
(30%) faster. The reason is that much of the work in these calculations is performed in the real-space and therefore 
the extra amount of reciprocal-space required by EW3DC due to the enlarged gap is not a major issue. In fact it 
is almost compensated by the time used by layer correction term. For larger N, EW3DLC should become faster 
than EW3DC since the reciprocal-space time becomes much longer than the layer correction time. The same is true 
about higher accuracy conditions as is observable from the analytical error estimates (Eqs. |l3" 14). Moreover, in a 
mesh implementation, the same comparison should yield a more substantial advantage to the layer correction since 
more of the computational effort is expended in reciprocal space. 



VI. DISCUSSION 



We have presented and tested a method with broad applicability and low computational cost for computing the 
electrostatic forces or energy of a cell of charges with two-dimensional periodicity. This method has a complexity 
that is virtually linear when applied to a mesh, so that only minor further improvements in computation time can be 
attempted through a reduction of computational overhead. The algorithm differs from EW3DC only by the effort to 
program the ELC-term, but once this is done the errors are easier to control and there is a greater degree of freedom 
in choosing the box edges. This makes it easy to apply to cubic versions of the P3M algorithm (which was one of our 
initial motivations). For larger systems and better accuracy as may be required in Monte Carlo and some molecular 
dynamics simulations the method should be considerably faster than EW3DC. 

It is interesting to estimate the largest system tractable with modern computing. The P3M algorithm can be 
optimized to be at least twice as fast as the version presented here, so an N — 10000 calculation could be carried out 
in approximately. 2 seconds. A 100-node parallel computer with a 90% scale factor could handle a system of 0.45 
million particles in 1 second - which is sufficient for a many molecular dynamics and Monte Carlo simulations. 

Finally we can draw attention to several issues suitable for further research. Most importantly we have not treated 
the case of dissimilar dielectric materials at the slab surfaces. Because of the symmetry of the consequent charge 
reflections there could be a way to handle the problem efficiently within the current framework. Another topic is the 
clarification of the dipole term for slab- wise summation for different dielectric constants at the interface between the 
periodic supersystem and the surroundings. 
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ELC-TERM FORCES AND PRESSURE 



For quick reference the force and pressure formulas for the layer correction are given here. The layer correction 
force on particle i is 



f (Zc) = 



E** 2 E ^1 f " sin 0^j ) cos{k y yij ) 



L X Ly 



/c x ,fcj,>0 



k\Ae k w L > - 1) 



Ecosh^Zy) 



fc*>0 

\- [ fc y cosh(fc||^) 

2^*9i 2 ^ — cos(k x x ij )sm(k y y ij ) + 

^ cosh(fc y z y ) i 

fe s >0 v ' 

-J2l*U 2 E cos(fc x gi3-)cos(fe„yij-) + 

j fe !c ,/c B >0 ^ > 

' '~cZij) /7 \ i \ ^ sinh(fcj / 2;ij) 

— cos(fc x a;y) + 2^ (e fc y L, _ n cos ( fc y^) 



E 



(e 



k y >0 



(44) 



The diagonal components of the pressure tensor can be obtained from the partition function. For example the normal 
component is Pi = P zz = — , K Mf-. This yields 



Ale) 



8ir 2 N ( 

JJJ2 E m A 2 E is(k ll )cos(k x x ij )cos(k y y ij ) + 
x v i,j=i V fc x ,fc y ez : >o 

^ v(k x ) cos(k x Xij) + 
fe x >o 



E v(k y ) cos(k y yij) 



k v >0 



where 



(sinhuzy) (zij /L z )e uL * — (sinhuz i:) -) (zij/L z ) — (cosh uzij ) e uL * 

uL 



For one of the parallel components we find 

^--tSe^I 2 e 



(e uL ° - l) 2 



k 2 k 2 cosh(knzn) 



&x fey 

^ v(k x ) cos(k x Xij) + 



cos(k x Xij) cos (k y y i: j) + 



El cosh(k y Zij) \ 
,„ >0 ^P^ coste) | 



(45) 
(46) 

(47) 



(48) 
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Obviously this is cumbersome since after decomposing the trigonometric and hyperbolic terms to obtain a formula 
that can be evaluated linearly in N the expanded version will contain over 80 terms. If rapid pressure calculations 
are not demanded one can omit this term and increase L z . One third of the trace of this tensor will as usual yield 
the scalar pressure, 

Another consideration in obtaining the pressures is the treatment of the dipole term. Fortunately, a shear or 
distortion on the system does not result in a distortion in the shape of the supersystem. This is in contrast to the 
treatment of standard Ewald summation because the spherical summation order requires a complex analysis of the 
consequences of a distorted sphere.E3 
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